Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RFC][dagster-powerbi] Implement copy_with_context in DagsterPowerBITranslator #26617

Conversation

maximearmstrong
Copy link
Contributor

@maximearmstrong maximearmstrong commented Dec 20, 2024

Summary & Motivation

This PR is motivated by this discussion

Context:

Translators for BI integrations required the contextual data from the workspace to create the asset specs. To pass this data to the translator, we ask users to pass the type of the translator to the spec loader, and we initialize the translator with its context in the state-back defs.

This approach works well, but is not flexible for users that want to customize their translators - users often customize the __init__ method, but passing this approach prevent them to initialize the translator with the correct arguments.

We want to update BI integrations so that users can pass an instance of the translator to spec loaders, which is more flexible, instead of the type.

Challenges:

  • the translator object should to be immutable
  • users can customize __init__ method and add new parameters to its signature, which are unknown to us
  • once a context is set in a translator instance, it should not change
  • keep this as simple as possible for users

Solution:

Use the copy_with_context method to rebuild a translator with its context.

To make this possible we need to:

  • infer the __init__ kwargs from the attributes and the __init__ signature, using _get_init_kwargs_from_instance
  • have matching attributes and params in the translator
  • have the context as a parameter in the __init__ method of a custom translator

So, to make this possible, we require a few things from users:

  • the signature of the custom __init__ method must include the context parameter
  • Every attribute or private attribute must match their __init__ param, eg. the attribute for my_param can be my_param or _my_param
class MyCustomTranslator(DagsterPowerBITranslator):
    def __init__(self, my_param: str, context: Optional[PowerBIWorkspaceData] = None):
        self.my_param = my_param
        super().__init__(context=context)

    def get_asset_spec(self, data: PowerBIContentData) -> AssetSpec:
        default_spec = super().get_asset_spec(data)
        return default_spec.merge_attributes(
            metadata={"custom": self.my_param},
        )
 
# user code
translator = MyCustomTranslator(my_param=test_param)

# internal code
translator.copy_with_context(
    context=context
)
asset_spec = translator.get_asset_spec(...)

Proposals that were not retained:

  1. Pass a callable factory to construct a translator, see here - this complexifies the user code
  2. Pass the context to get_asset_spec, see here - this is breaking change that we would like to avoid.
  3. Set the context on a translator after it’s initialization, see here - that makes the object mutable

How I Tested These Changes

New tests with BK

Changelog

Insert changelog entry or delete this section.

Copy link
Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@@ -437,7 +437,7 @@ def fetch_state(self) -> PowerBIWorkspaceData:
)

def defs_from_state(self, state: PowerBIWorkspaceData) -> Definitions:
translator = self.translator_cls(context=state)
translator = self.translator_cls().copy_with_context(context=state)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

State-backed defs and spec loader to accept a translator instance in a subsequent PR.

@maximearmstrong maximearmstrong changed the title [dagster-powerbi] Implement copy_with_context in DagsterPowerBITranslator [RFC][dagster-powerbi] Implement copy_with_context in DagsterPowerBITranslator Dec 20, 2024
@maximearmstrong maximearmstrong marked this pull request as ready for review December 20, 2024 15:50
@maximearmstrong maximearmstrong self-assigned this Dec 20, 2024
Copy link
Contributor

@OwenKephart OwenKephart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm on board with all the stuff about passing an instance of the translator etc.

However, I think it's probably too sketchy to try to infer the init arguments and reconstruct the class etc.

It'd work for a lot of cases but I think it's not intuitive for the user that their custom init function would get called twice, and I don't have high confidence that there isn't some sneaky way of breaking this with fancy parameter specs (like **kwargs etc.)

We usually avoid mutable objects (for good reason), but I think this is a prime example of a place where that'd be a better solution.

We could just have:

class DagsterPowerBITranslator:
    _context: Optional[PowerBiWorkspaceData]

    def set_context(self, context: PowerBiWorkspaceData) -> None:
        self._context = context

    @property
    def context(self) -> PowerBIWorkspaceData:
        return check.not_none(self._context, "helpful message explaining you probbably called a method outside of framework code")

This lets us avoid all the fancy inspect stuff and is much easier to wrap your head around (imo)

@maximearmstrong
Copy link
Contributor Author

maximearmstrong commented Dec 20, 2024

However, I think it's probably too sketchy to try to infer the init arguments and reconstruct the class etc.

It'd work for a lot of cases but I think it's not intuitive for the user that their custom init function would get called twice, and I don't have high confidence that there isn't some sneaky way of breaking this with fancy parameter specs (like **kwargs etc.)

We usually avoid mutable objects (for good reason), but I think this is a prime example of a place where that'd be a better solution.

@OwenKephart I agree with you 100%.

I just updated the description of this PR to add proposals that were not retained. Your proposal was previously suggested, but not retained because it would make the object mutable.

That said, I agree that inferring the init arguments from the instance is error-prone.

@schrockn Considering the implementation it requires to make the object immutable, do you think it would be a reasonable trade-off to use a setter and make this object mutable? I think there'd be a lot less risk of error.

@maximearmstrong
Copy link
Contributor Author

Closing this RFC - we found an alternative, see here.

maximearmstrong added a commit that referenced this pull request Jan 2, 2025
…to PowerBITranslatorData (#26654)

## Summary & Motivation

This PR implements `PowerBITranslatorData`, a container class containing
a `PowerBIContentData` object (props) and a `PowerBIWorkspaceData`
object (context), to pass contextual data to the translator without
breaking the `get_asset_spec` signature or requiring context to be
passed to the `__init__` method of the `DagsterPowerBITranslator`.

This is PR is an alternative to the proposals mentioned in
#26617.

## How I Tested These Changes

Updated tests with BK

## Changelog

[dagster-powerbi] Type hints in the signature of
`DagsterPowerBITranslator.get_asset_spec` have been updated - the
parameter `data` is now of type `PowerBITranslatorData` instead of
`PowerBIContentData`. Custom Power BI translators should be updated.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants